
    Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

    The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Unlike existing pre-training paradigms for deep point cloud feature extractors, which fall within the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin by deducing view-conditioned point-wise embeddings through the insertion of a viewpoint indicator, and then adaptively aggregate them into a view-specific global codeword, which is further fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations on various downstream task scenarios demonstrate that PointVST consistently and prominently outperforms current state-of-the-art approaches and exhibits satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST.
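    Below is a minimal, hypothetical sketch of a translative pre-training head in this spirit: per-point features are conditioned on a viewpoint indicator, pooled into a view-specific global codeword, and decoded by a small convolutional translation head into a coarse 2D rendering. All module names, dimensions, and layer choices are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical PointVST-style translative head (names and shapes are illustrative).
import torch
import torch.nn as nn

class ViewSpecificTranslator(nn.Module):
    def __init__(self, point_feat_dim=256, view_dim=3):
        super().__init__()
        # Fuse per-point backbone features with a viewpoint indicator (e.g. a view direction).
        self.view_conditioner = nn.Sequential(
            nn.Linear(point_feat_dim + view_dim, point_feat_dim), nn.ReLU(),
            nn.Linear(point_feat_dim, point_feat_dim),
        )
        # Translate the view-specific global codeword into a coarse 2D rendering.
        self.to_grid = nn.Linear(point_feat_dim, 128 * 8 * 8)
        self.translation_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, point_feats, view_dir):
        # point_feats: (B, N, C) backbone features; view_dir: (B, 3) viewpoint indicator.
        B, N, _ = point_feats.shape
        view = view_dir.unsqueeze(1).expand(-1, N, -1)
        conditioned = self.view_conditioner(torch.cat([point_feats, view], dim=-1))
        codeword = conditioned.max(dim=1).values          # view-specific global codeword
        grid = self.to_grid(codeword).view(B, 128, 8, 8)
        return self.translation_head(grid)                # (B, 1, 64, 64) predicted rendering
```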

    PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition

    As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from the different domains of geometric structure and visual appearance. In the current deep learning era, remarkable progress in processing these two data modalities has been achieved by customizing compatible 3D and 2D network architectures respectively. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance on several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, owing to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Generally, we propose PointMCD, a unified multi-view cross-modal distillation architecture comprising a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between the 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By pairwise alignment of multi-view visual and geometric descriptors, we obtain more powerful deep point encoders without exhaustive and complicated network modifications. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method. The code and data will be publicly available at https://github.com/keeganhk/PointMCD.
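    A small, assumed sketch of the two ingredients described above follows: a visibility-aware pooling step that turns point-wise student embeddings into view-specific geometric descriptors, and a pairwise alignment loss against teacher descriptors from a frozen 2D image encoder. The visibility masks, dimensions, and cosine-distance objective are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical multi-view cross-modal distillation sketch in the spirit of PointMCD.
import torch
import torch.nn.functional as F

def visibility_aware_projection(point_feats, visibility):
    """Aggregate point-wise embeddings into view-specific geometric descriptors.

    point_feats: (B, N, C) student point embeddings.
    visibility:  (B, V, N) binary mask, 1 if a point is visible from a given view.
    returns:     (B, V, C) one geometric descriptor per view.
    """
    masked = point_feats.unsqueeze(1) * visibility.unsqueeze(-1)   # (B, V, N, C)
    return masked.max(dim=2).values                                # per-view max pooling

def distillation_loss(student_desc, teacher_desc):
    """Pairwise alignment of geometric (student) and visual (teacher) descriptors.

    teacher_desc would come from a frozen, pretrained multi-view image encoder.
    """
    student_desc = F.normalize(student_desc, dim=-1)
    teacher_desc = F.normalize(teacher_desc, dim=-1)
    return (1.0 - (student_desc * teacher_desc).sum(dim=-1)).mean()  # cosine distance
```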

    Bidirectional Propagation for Cross-Modal 3D Object Detection

    Recent works have revealed the superiority of feature-level fusion for cross-modal 3D object detection, where fine-grained feature propagation from 2D image pixels to 3D LiDAR points has been widely adopted for performance improvement. Still, the potential of heterogeneous feature propagation between the 2D and 3D domains has not been fully explored. In this paper, in contrast to existing pixel-to-point feature propagation, we investigate the opposite point-to-pixel direction, allowing point-wise features to flow inversely into the 2D image branch. Thus, when jointly optimizing the 2D and 3D streams, the gradients back-propagated from the 2D image branch can boost the representation ability of the 3D backbone network operating on LiDAR point clouds. Combining the pixel-to-point and point-to-pixel information flow mechanisms, we construct a bidirectional feature propagation framework, dubbed BiProDet. In addition to the architectural design, we also propose normalized local coordinate map estimation, a new 2D auxiliary task for training the 2D image branch, which facilitates learning local spatially aware features from the image modality and implicitly enhances overall 3D detection performance. Extensive experiments and ablation studies validate the effectiveness of our method. Notably, we ranked 1st on the highly competitive KITTI benchmark for the cyclist class at the time of submission. The source code is available at https://github.com/Eaphan/BiProDet. Comment: Accepted by ICLR 2023.
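    The sketch below illustrates the point-to-pixel direction described above: point features are scattered onto their projected pixel locations and fused with the 2D feature map, so that gradients from the image branch reach the 3D backbone during joint training. The module name, projection inputs, and scatter loop are assumptions (and assume in-bounds pixel coordinates), not the authors' exact code.

```python
# Hypothetical point-to-pixel propagation sketch (reverse of pixel-to-point fusion).
import torch
import torch.nn as nn

class PointToPixelFusion(nn.Module):
    def __init__(self, point_dim=64, img_dim=64):
        super().__init__()
        self.fuse = nn.Conv2d(img_dim + point_dim, img_dim, kernel_size=1)

    def forward(self, img_feats, point_feats, pix_uv):
        # img_feats:   (B, C_img, H, W) features from the 2D image branch.
        # point_feats: (B, N, C_pt) features from the 3D LiDAR backbone.
        # pix_uv:      (B, N, 2) integer (u, v) projection of each point onto the image.
        B, C_img, H, W = img_feats.shape
        canvas = img_feats.new_zeros(B, point_feats.shape[-1], H, W)
        for b in range(B):  # simple per-sample scatter; a real system would vectorize this
            u, v = pix_uv[b, :, 0].long(), pix_uv[b, :, 1].long()
            canvas[b, :, v, u] = point_feats[b].t()
        # Joint optimization lets 2D-branch gradients flow back into the 3D backbone.
        return self.fuse(torch.cat([img_feats, canvas], dim=1))
```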

    GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation

    The inherent ambiguity in ground-truth annotations of 3D bounding boxes, caused by occlusions, missing signals, or manual annotation errors, can confuse deep 3D object detectors during training and thus deteriorate detection accuracy. However, existing methods largely overlook such issues and treat the labels as deterministic. In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of an object, and propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables. The label uncertainty generated by GLENet serves as a plug-and-play module that can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and supervise the learning of localization uncertainty. Besides, we propose an uncertainty-aware quality estimator architecture for probabilistic detectors to guide the training of the IoU branch with the predicted localization uncertainty. We incorporate the proposed methods into various popular base 3D detectors and demonstrate significant and consistent performance gains on both the KITTI and Waymo benchmark datasets. In particular, the proposed GLENet-VR outperforms all published LiDAR-based approaches by a large margin and ranks 1st among single-modal methods on the challenging KITTI test set. We will make the source code and pre-trained models publicly available.
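    As a rough illustration of the idea, the following conditional-VAE sketch maps an object's context embedding plus a latent code to a 7-DoF box; sampling several latent codes yields diverse plausible boxes whose variance can be read as a label-uncertainty estimate. The context features, box parameterization, and network sizes are assumptions, not GLENet's actual architecture.

```python
# Hypothetical CVAE sketch for generating plausible bounding boxes (GLENet-style idea).
import torch
import torch.nn as nn

class BoxCVAE(nn.Module):
    def __init__(self, ctx_dim=256, box_dim=7, z_dim=32):
        super().__init__()
        self.prior = nn.Linear(ctx_dim, 2 * z_dim)            # p(z | context)
        self.post = nn.Linear(ctx_dim + box_dim, 2 * z_dim)    # q(z | context, box), used in training
        self.decoder = nn.Sequential(
            nn.Linear(ctx_dim + z_dim, 128), nn.ReLU(), nn.Linear(128, box_dim))

    def sample_boxes(self, ctx, num_samples=32):
        # Draw latent codes from the prior and decode diverse plausible boxes; the
        # per-dimension variance across samples acts as a label-uncertainty estimate.
        mu, logvar = self.prior(ctx).chunk(2, dim=-1)
        boxes = []
        for _ in range(num_samples):
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            boxes.append(self.decoder(torch.cat([ctx, z], dim=-1)))
        boxes = torch.stack(boxes, dim=0)          # (S, B, 7)
        return boxes.mean(0), boxes.var(0)         # mean box and variance (uncertainty)
```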

    Multilevel Perception Boundary-guided Network for Breast Lesion Segmentation in Ultrasound Images

    Automatic segmentation of breast tumors from ultrasound images is essential for subsequent clinical diagnosis and treatment planning. Although existing deep learning-based methods have achieved significant progress in automatic breast tumor segmentation, their performance on tumors with intensity similar to normal tissue remains unsatisfactory, especially around tumor boundaries. To address this issue, we propose PBNet, composed of a multilevel global perception module (MGPM) and a boundary-guided module (BGM), to segment breast tumors from ultrasound images. Specifically, in the MGPM, the long-range spatial dependencies between voxels in single-level feature maps are modeled, and the multilevel semantic information is then fused to promote the model's ability to recognize non-enhanced tumors. In the BGM, tumor boundaries are extracted from the high-level semantic maps using the dilation and erosion effects of max pooling; these boundaries are then used to guide the fusion of low-level and high-level features. Moreover, to improve segmentation performance on tumor boundaries, a multilevel boundary-enhanced segmentation (BS) loss is proposed. Extensive comparison experiments on both a publicly available dataset and an in-house dataset demonstrate that the proposed PBNet outperforms state-of-the-art methods in terms of both qualitative visualization results and quantitative evaluation metrics, with the Dice score, Jaccard coefficient, specificity, and HD95 improved by 0.70%, 1.1%, 0.1%, and 2.5%, respectively. In addition, ablation experiments validate that the proposed MGPM is indeed beneficial for distinguishing non-enhanced tumors, and that the BGM and the BS loss are also helpful for refining the segmentation contours of the tumor. Comment: 12 pages, 5 figures.
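    The boundary-extraction step mentioned above can be sketched with a standard morphological trick: max pooling a probability map acts as dilation, max pooling its negation acts as erosion, and their difference highlights the boundary band. The kernel size and input shape below are assumptions for illustration.

```python
# Minimal sketch of boundary extraction via the dilation/erosion effects of max pooling.
import torch
import torch.nn.functional as F

def soft_boundary(prob_map, kernel_size=3):
    """prob_map: (B, 1, H, W) high-level semantic (foreground) probabilities."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(prob_map, kernel_size, stride=1, padding=pad)    # dilation
    eroded = -F.max_pool2d(-prob_map, kernel_size, stride=1, padding=pad)   # erosion
    return dilated - eroded   # high response only near the tumor boundary

# The resulting boundary map could, for example, weight a boundary-enhanced segmentation loss
# or gate the fusion of low-level and high-level features.
```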

    NeuroGF: A Neural Representation for Fast Geodesic Distance and Path Queries

    Geodesics are essential in many geometry processing applications. However, traditional algorithms for computing geodesic distances and paths on 3D mesh models are often inefficient and slow, which makes them impractical for scenarios that require extensive querying of arbitrary point-to-point geodesics. Although neural implicit representations have emerged as a popular way of representing 3D shape geometries, there has been no research on representing geodesics with deep implicit functions. To bridge this gap, this paper presents the first attempt to represent geodesics on 3D mesh models using neural implicit functions. Specifically, we introduce neural geodesic fields (NeuroGFs), which are learned to represent the all-pairs geodesics of a given mesh. With NeuroGFs, queries of arbitrary point-to-point geodesic distances and paths can be answered efficiently and accurately, overcoming the limitations of traditional algorithms. Evaluations on common 3D models show that NeuroGFs exhibit exceptional performance in solving single-source all-destination (SSAD) and point-to-point geodesic problems, consistently achieving high accuracy. Moreover, NeuroGFs offer the unique advantage of encoding both 3D geometry and geodesics in a unified representation. Code is made available at https://github.com/keeganhk/NeuroGF/tree/master.
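    For intuition, a neural geodesic field can be thought of as a small network overfit to one mesh that maps a pair of query points to their geodesic distance (recovering paths would require an additional decoder). The sketch below uses a plain coordinate MLP; the architecture and training objective are assumptions, not the NeuroGF implementation.

```python
# Hypothetical sketch of a per-mesh neural geodesic distance field.
import torch
import torch.nn as nn

class GeodesicDistanceField(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus())   # geodesic distances are non-negative

    def forward(self, src, dst):
        # src, dst: (B, 3) query points on the mesh surface.
        return self.mlp(torch.cat([src, dst], dim=-1)).squeeze(-1)

# Training sketch: regress against ground-truth geodesics precomputed on the mesh, e.g.
#   loss = torch.nn.functional.l1_loss(model(src, dst), gt_geodesic)
```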

    A new smart mobile system for chronic wound care management

    Nonhealing wounds pose a major challenge in clinical medicine. Typical chronic wounds, such as diabetic foot ulcers and venous leg ulcers, cause substantial difficulties for millions of patients around the world. The management of chronic wound care remains challenging in terms of precise wound size measurement, comprehensive wound assessment, timely wound healing monitoring, and efficient wound case management. Despite the rapid progress of digital health technologies in recent years, practical smart wound care management systems have yet to be developed. One of the main difficulties is in-depth communication and interaction with nurses and doctors throughout the complex wound care process. This paper presents a systematic approach to the user-centered design and development of a new smart mobile system for chronic wound care management that supports the nurse's task flow and meets the requirements for caring for different types of wounds in both clinics and hospital wards. A system evaluation and satisfaction review were carried out with a group of ten nurses from various clinical departments after they had used the system for over one month. The survey results demonstrated the high effectiveness and usability of the smart mobile system for chronic wound care management, in contrast to the traditional pen-and-paper approach, in busy clinical contexts.

    A study on the appropriate dose of rocuronium for intraoperative neuromonitoring in Da Vinci robot thyroid surgery: a randomized, double-blind, controlled trial

    Background: This study aimed to explore the effect of different doses of rocuronium bromide on neuromonitoring during Da Vinci robot thyroid surgery. Methods: This was a prospective, randomized, double-blind, controlled trial that included 189 patients who underwent Da Vinci robot thyroidectomy with intraoperative neuromonitoring (IONM). Patients were randomly divided into three groups and given three different doses of rocuronium (0.3 mg/kg, 0.6 mg/kg, or 0.9 mg/kg). Outcome measurements included IONM evoked potentials, postoperative Voice Handicap Index-30 (VHI-30) scores, intraoperative body movement incidence rate, Cooper scores, and hemodynamic changes during anesthesia induction. Results: The differences in IONM evoked potentials at various time points between the three groups were not statistically significant (P>0.05). The differences in Cooper scores and intraoperative body movement incidence rate between the 0.6 and 0.9 mg/kg groups and the 0.3 mg/kg group were statistically significant (both P<0.001). There was no statistically significant difference in VHI-30 scores or hemodynamic changes during anesthesia induction among the three groups (both P>0.05). Conclusions: For patients undergoing Da Vinci robot thyroidectomy, a single dose of rocuronium at 0.6 or 0.9 mg/kg during anesthesia induction can provide stable IONM evoked potentials. Additionally, compared to 0.3 mg/kg, it can offer better tracheal intubation conditions and a lower incidence of body movement during surgery. It is worth noting that the use of higher doses of rocuronium should be adjusted based on the duration of IONM and local practice.